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Abstract: We show how a Hopfield network with modifiable recurrent connec- 
tions undergoing slow Hebbian learning can extract the underlying geometry of 
an input space. First, we use a slow/fast analysis to derive an averaged system 
whose dynamics derives from an energy function and therefore always converges 
to equilibrium points. The equilibria reflect the correlation structure of the in- 
puts, a global object extracted through local recurrent interactions only. Second, 
we use numerical methods to illustrate how learning extracts the hidden geomet- 
rical structure of the inputs. Indeed, multidimensional scaling methods make it 
possible to project the final connectivity matrix on to a distance matrix in a high- 
dimensional space, with the neurons labelled by spatial position within this space. 
The resulting network structure turns out to be roughly convolutional. The resid- 
ual of the projection defines the non-convolutional part of the connectivity which 
is minimized in the process. Finally, we show how restricting the dimension of 
the space where the neurons live gives rise to patterns similar to cortical maps. 
We motivate this using an energy efficiency argument based on wire length min- 
imization. Finally, we show how this approach leads to the emergence of ocular 
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dominance or orientation columns in primary visual cortex. In addition, we es- 
tablish that the non-convolutional (or long-range) connectivity is patchy, and is 
co-aligned in the case of orientation learning. 

Keywords: correlation-based Hebbian learning, Hopfield networks, temporal 
averaging, energy minimization, multidimensional scaling, cortical maps 



1 Introduction 

Activity -dependent synaptic plasticity is generally thought to be the basic cellular 
substrate underlying learning and memory in the brain. Donald Hebb QHebb, 1949fl 



postulated that learning is based on the correlated activity of synaptically con- 
nected neurons: if both neurons A and B are active at the same time, then the 
synapses from A to B and B to A should be strengthened proportionally to the 
product of the activity of A and B. However, as it stands, Hebb's learning rule 
diverges. Therefore, various modification of Hebb's rule have been developed, 
which basically take one of three forms (see QGerstner and Kistler, 2002) and 



[Dayan and Abbott, 2001 1): first, a decay term can be added to the learning rule 
so that each synaptic weight is able to "forget" what it previously learned. Sec- 
ond, each synaptic modification can be normalized or projected on different sub- 
spaces. These constraint-based rules may be interpreted as implementing some 
form of competition for energy between dendrites and axons, see [ Miller, 1996 



Miller and MacKay, 1996] and | Ooyen, 2001 1 for details. Third, a sliding thresh 



old mechanism can be added to Hebbian learning. For instance, a post-synaptic 
threshold rule consists in multiplying the presynaptic activity and the subtraction 
of the average postsynaptic activity from its current value, which is referred as 
covariance learning ([ Sejnowski and Tesauro, 1989) ). Probably the best known 



of these rules is the BCM rule flBienenstock et al., 1982] . It should be noted 
that history-based rules can also be defined without changing the qualitative dy- 
namics of the system: instead of considering the instantaneous value of the neu- 
rons' activity, these rules consider its weighted mean over a time window (see 
JFoldiak, 1991j Wallis and Baddeley, 1997 1). Recent experimental evidence sug- 



gests that learning may also depend upon the precise timing of action potentials 
QBi and Poo, 2001] . Contrary to most Hebbian rules that only detect correlations, 



these rules can also encode causal relationships in the patterns of neural activa- 
tion. However, the mathematical treatment of these spike timing dependent rules 
is much more difficult than rate based ones. 



2 



Hebbian-like learning rules have often been studied within the framework 
of unsupervised feedfoward neural networks [ |Oja, 1982] |Bienenstock et al., 1982 



Miller and MacKay, 1996, Dayan and Abbott, 2001 1. They also form the basis of 



most weight-based models of cortical development, assuming fixed lateral con- 
nectivity (e.g. mexican hat) and modifiable vertical connections (see the review 
of QSwindale, 1995| fl In these developmental models, the statistical structure 



of input correlations provides a mechanism for spontaneously breaking some un- 
derlying symmetry of the neuronal receptive fields leading to the emergence of 
feature selectivity. When such correlations are combined with fixed intracorti- 
cal interactions, there is a simultaneous breaking of translation symmetry across 
cortex leading to the formation of a spatially periodic cortical feature map. A re- 
lated mathematical formulation of cortical map formation has been developed in 
QTakeuchi and Amari, 1979[ |Bressloff, 2005| using the theory of self-organizing 



neural fields. Although very irregular, the two-dimensional cortical maps ob- 
served at a given stage of development, can be unfolded in higher dimensions 



to get smoother geometrical structures. Indeed, QBressloff et al., 2001] suggested 
that the network of orientation pinwheels in VI is a direct product between a circle 
for orientation preference and a plane for position, based on a modification of the 



icecube model of Hubel and Wiesel [Hubel and Wiesel, 1977 1. From a more ab 



stract geometrical perspective, Petitot QPetitot, 2003) has associated such a struc- 
ture to a 1-jet space and used this to develop some applications to computer vision. 
More recently, QBressloff and Cowan, 2003] and QChossat and Faugeras, 2009[ have 



considered more complex geometrical structures such as spheres and hyperbolic 
surfaces that incorporate additional stimulus features such as spatial frequency 
and textures, respectively. 

In this paper, we show how geometrical structures related to the distribution 
of inputs can emerge through unsupervised Hebbian learning applied to recur- 
rent connections in a rate-based Hopfield network. Throughout this paper, the 
inputs are presented as an external non-autonomous forcing to the system and not 
an initial condition as is often the case in Hopfield networks. It has previously 
been shown that, in the case of a single fixed input, there exists an energy func- 
tion that describes the joint gradient dynamics of the activity and weight variables 



[Dong and Hopfield, 1992 1. This implies that the system converges to an equilib- 
rium during learning. We use averaging theory to generalize the above result to 
the case of multiple inputs, under the adiabatic assumption that Hebbian learning 



'There have only been a few computational studies that consider the joint development of 
lateral and vertical connections f Bartsch and Van Hemmen, 200 1 Miikkulain en et al., 2005 1 . 
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occurs on a much slower time scale than both the activity dynamics and the sam- 
pling of the input distribution. We then show that the equilibrium distribution of 
weights, when embedded into R fc for sufficiently large integer k, encodes the geo- 
metrical structure of the inputs. Finally, we numerically show that the embedding 
of the weights in two dimensions (k = 2) gives rise to patterns that are quali- 
tatively similar to experimentally observed cortical maps, with the emergence of 
features columns and patchy connectivity. Although the mathematical formalism 
we introduce here could be extended to most of the rate-based Hebbian rules in 
the literature, we present the theory for Hebbian learning with decay because of 
the simplicity of the resulting dynamics. 

Note that the use of geometrical objects to describe the emergence of connec- 
tivity patterns has previously been put forward by Amari in a different context. 
Based on the theory of information geometry, Amari considers the geometry of 
the set of all the networks and defines learning as a trajectory on this manifold for 
perceptron networks in the framework of supervised learning [Amari, 1998) or 
for unsupervised Boltzmann Machines flAmari et al., 1992) . He uses differential 
and Riemannian geometry to describe an object which is at a larger scale than the 
cortical maps this paper is focusing on. 

Moreover, Zucker and colleagues are currently developing a non-linear dimen- 
sionality reduction approach to caracterize the statistics of natural visual stimuli 
(see QLawlor and Zucker, 2010| |Coifman et al., 2005] ). Although they do not use 
learning neural networks and stay closer to the field of computer vision than this 
paper, it turns out their approach is similar to the geometrical embedding approach 
we are using. 

The structure of the paper is as follows. In section |2] we formally introduce 
the model. We derive the averaged system in section |3] which then allows us 
to study the stability of the learning dynamics in the presence of multiple inputs 
by constructing an appropriate energy function. We adress stability in section 
IU In section \5\ we determine the geometrical structure of the equilibrium weight 
distribution and show how it reflects the structure of the inputs. We also relate this 
approach to the emergence of cortical maps. Finally, the results are discussed in 
section [6j 
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2 Model 



2.1 Neural network evolution 

A neural mass corresponds to a mesoscopic coherent group of neurons. It is con- 
venient to consider them as building blocks, first for computational simplicity, 
second for their direct relationship to macroscopic measurements of the brain 
(EEG, MEG and Optical imaging) which average over numerous neurons, and 
third because one can functionally define coherent groups of neurons within cor- 
tical columns. For each neural mass i G {1..N}, define the mean membrane 
potential Vi(t) at time t. The instantaneous population firing rate Ui(t) is linked to 
the membrane potential through the relation Ui(t) = s(Vi(t)) , where s is a smooth 
sigmoid function. In the following, we choose 



where S m , S' m and are respectively the maximal firing rate, the maximal slope 
and the offset of the sigmoid. 

Consider a Hop field network of neural masses described by the equation 



The first term roughly corresponds to the intrinsic dynamics of the neural mass: 
it decays exponentially to zero at a rate a if it receives neither external inputs nor 
spikes from the other neural masses. We will fix the units of time by setting a = 
1. The second term corresponds to the rest of the network sending information 
through spikes to the given neural mass i, with Wy{t) the effective synaptic weight 
from neural mass j. The synaptic weights are time-dependent because they evolve 
according to a continuous time Hebbian learning rule (see below). The third term 
Ii(t) corresponds to an external input to neural mass i, e.g. information extracted 
by the retina or fhalamo-cortical connections. We take the inputs to be piecewise 
constant in time, that is, at regular time intervals a new input is presented to the 
network. In this paper, we will assume that the inputs are chosen by peridodically 
cycling through a given set of M inputs. An alternative approach would be to 
randomly select each input from a given probability distribution [Geman, 1979[ |. 
It is convenient to introduce vector notation by representing the time-dependent 
membrane potentials by V G C 1 (R + ,M N ), the time-dependent external inputs 



(71 



(1) 



0)) 




(2) 
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by / e C°(IR + ,R Ar ), and the time-dependent network weight matrix by W G 
C 1 (M + , M iVxAr ). We can then rewrite the above system of ordinary differential 
equations as a single vector- valued equation 

d X- = -v + W ■ S(V) + I, (3) 

where S : M. N — > M. N corrresponds to the term by term application of the sigmoid 
S, i.e. S(Y)i = sW). 



2.2 Correlation-based Hebbian learning 



The synaptic weights are assumed to evolve according to a correlation-based Heb- 
bian learning rule of the form 



dW„ 



ij 



dt 



e(s(Vi)s(V;-) - /jWi 



13)1 



(4) 



where e is the learning rate, and we have included a decay term in order to stop the 
weights from diverging. In order to rewrite the above equation in a more compact 
vector form, we introduce the tensor (or Kronecker) product S(V) ® S(V) so that 
in component form 



[S(V) <g) S(V)]ij = S(V)iS(V) 



13 ' 



(5) 



where S is treated as a mapping from R to R . The tensor product implements 
Hebb's rule that synaptic modifications involve the product of postynaptic and 
presynaptic firing rates. We can then rewrite the combined voltage and weight 
dynamics as the following non-autonomous (due to time-dependent inputs) dy- 
namical system: 



( dV 

= -V + W ■ S(V) + I 

dt 



S : < 



dW 
~~dT 



e(s(V) ®S(V) -fiWy 



(6) 



Let us make few remarks about the existence and uniqueness of solutions. 
First, boundedness of S implies boundedness of the system S. More precisely, if 
/ is bounded, then the solutions are bounded. To prove this, note that the right 
hand side of the equation for W is the sum of a bounded term and a linear decay 
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term in W. Therefore, W is bounded and hence the term W-S(V) is also bounded. 
The same reasoning applies to V. S being Lipschitz continuous implies that the 
right hand side of the system is Lipschitz. This is sufficient to prove existence 
and uniqueness of the solution by applying the Cauchy-Lipschitz theorem. In the 
following, we will derive an averaged autonomous dynamical system which 
will be well-defined for the same reasons. 



3 Averaging the system 

We will show that system £ can be approximated by an autonomous Cauchy prob- 
lem which will be much more convenient to handle. This averaging method makes 
the most of multiple time-scales in the system. First, it is natural to consider that 
learning occurs on a much slower time-scale than the evolution of the membrane 
potentials (as determined by a), i.e. 

e « 1. (7) 

Second, an additional time-scale arises from the rate at which the inputs are sam- 
pled by the network. That is, the network cycles periodically through M fixed 
inputs, with the period of cycling given by T. It follows that I is T-periodic, 
piecewise constant. We assume that the sampling rate is also much slower than 
the evolution of the membrane potentials, 

% « 1. (8) 

Finally, we assume that the period T is small compared to the time-scale of the 
learning dynamics, 

e « \- (9) 

We can now simplify the system £ by applying Tikhonov's theorem for slow/fast 
systems, and then classical averaging methods for periodic systems. 



3.1 Tikhonov's theorem 



Tikhonov's theorem ( QTikhonov, 1952) and QVerhulst, 2007| for a clear introduc 



tion) deals with slow/fast systems. It says the following: 
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Theorem 3.1. Consider the initial value problem 



x = f(x,y,t), x(0) = x , x e R n ,t G R. 
e?/ = g(x, y, t), y(0) = y , y G R m 



■+ 



Assume that: 



1. A unique solution of the initial value problem exists and we suppose, this 
holds also for the reduced problem 



with solutions x{t), y(t). 

2. The equation = g(x, y, t) is solved by y(t) = <p(x, t), where t) is a 
continuous function and an isolated root. Also suppose that y(t) = <p(x, t) 
is an asymptotically stable solution of the equation ^ = g(x, y, r) that is 
uniform in the parameters x e W 1 and t G M+. 

3. y(0) is contained in an interior subset of the domain of attraction ofy. 
Then we have 

\im t _> x e {t) = x(t), < t < L 
lim e ^ y e {t)=y{t), < d < t < L 

with d and L constants independent of e. 

In order to apply Tikhonov's theorem directly to the system S, we first need 
to rescale time according to t — > et. This gives 



Tikhonov's theorem then implies that solutions of E are close to solutions of the 
reduced system (in the unsealed time variable) 



x = f(x,y,t), x(0) = x 
= g(x,y,t) 



dV 

~'~dt 
dW 



S{V) <g) S(V) - fiW. 



V + W ■ S(V) + / 



dt 




(10) 
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provided that the dynamical systems £ in equation (6), and equation (flOl ) are well 
defined. It is easy to show that both systems are Lipschitz because of the properties 
of S. Following [ Faugeras et al., 2008) , we know that if 



S'J\W\\<1, (11) 

then there exists an isolated root V : M+ — > R N of the equation V = W ■ S(V) + / 
and V is asymptotically stable. Equation (fTTI) corresponds to the weakly con- 
nected case. Moreover, the initial condition belongs to the basin of attraction of 
this single fixed point. Note that we require ^ C 1 so that the membrane po- 
tentials have sufficient time to approach the equilibrium associated with a given 
input before the next input is presented to the network. In fact, this assumption 
make it reasonable to neglect the transient activity dynamics due to the switching 
between inputs. 



3.2 Periodic averaging 

The system given by equation (fTOt corresponds to a differential equation for W 
with T-periodic forcing due to the presence of V on the right-hand side. Since 
T <C e _1 , we can use classical averaging methods to show that solutions of (flOl) 
are close to solutions of the following autonomous system on the time-interval 
[0, -] (which we suppose large because e << 1) 

V(t) = W -S(V(t))+I(t) 



dW / 1 



e 



dt \T 



t . (12) 



S{V{s)) ® S{V{s))ds - nW(t) 



It follows that solutions of £ are also close to solutions of £ - Finding the explicit 
solution V(t) for each input I(t) is difficult and requires fixed points methods, 
e.g. a Picard algorithm. Therefore, we will consider yet another system £' whose 
solutions are also close to £ an d hence £. In order to construct £' we need to 
introduce some additional notation. 

Let us label the M inputs by a — 1, . . . , M and denote by the fixed 
point solution of the equation = W ■ S(V^ a >) + J^. Given the periodic 
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sampling of the inputs, we can rewrite <TT2T > as 



= w ■ 5(yW) + 

M 



dW -1 M (13) 



a=l 

If we now introduce the N x M matrices V and X with components Vi a = V { 
and I ia = l\ a , then we can eliminate the tensor product and simply write (fT"3T ) in 
the matrix form 

V = W-S(V)+X 

AW ,1 ■ (14) 



dt 



e(-S(V)-S(V) T -^W(t)). 



where S(V) G M iVxM such that [S(V)] ia = s(V/ a) ). A second application of 
Tikhonov's theorem (in the reverse direction) then establishes that solutions of the 
system E (written in the matrix form dHJ) are close to solutions of the matrix 
system 

^ = -V + W -s{v)+x 

(15) 

dW 



dt {^s { v).s(vr-,w(t)] 

In the remainder of the paper we will focus on the system E' whose solutions 
are close to those of the original system E provided condition (fTTb is satisfied, 
i.e. the network is weakly connected. Clearly, the fixed points (V*, W*) of sys- 
tem E satisfy ||W*|| < ^ nL . Therefore, equation (HB says that if < 1 then 
Tikhonov's theorem can be applied and systems E and E' can be reasonably con- 
sidered as good approximations of each other. The advantage of the averaged sys- 
tem E' is that is given by autonomous ordinary differential equations. Moreover, 
since it is Lipschitz continuous, it leads to a well-posed Cauchy problem. Finally, 
note that it is straighforward to extend our approach to time -functional rules (e.g. 
sliding threshold or BCM rules as described in QBienenstock et al., 19 82]) which, 
in this new framework, would be approximated by simple ordinary differential 
equations (as opposed to time-functional differential equations) provided S is re- 
defined appropriately. 
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Figure 1: Percentage of error between final connectivities for the exact and aver- 
aged system. 



3.3 Simulations 

To illustrate the above approximation, we simulate a simple network with both 
exact, i.e. E, and averaged ,i.e. evolution equations. For these simulations, 
the network consists of N = 10 fully-connected neurons and is presented with 
M = 10 different random inputs taken uniformly in the intervals [0, 1]^. For this 
simulation we use s(x) = 1+e _\ {x _ 1} , and ji — 10. Figured] shows the percentage 
of error between final connectivities for different values of e and T/M. Figure |2] 
shows the temporal evolution of the norm of the connectivity for both the exact 
and averaged system for T = 10 3 and e = 10~ 3 . 



4 Stability 

4.1 Liapunov function 

In the case of a single fixed input (M = 1), the systems £ and £' are equivalent 
and reduce to the neural network with adapting synapses previously analyzed by 



| Dong and Hopfield, 1992 1. Under the additional constraint that the weights are 
symmetric (Wij = Wji), these authors showed that the simultaneous evolution of 
the neuronal activity variables and the synaptic weights can be re-expressed as a 
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Figure 2: Temporal evolution of the norm of the connectivities of the exact system 
X and averaged system 



gradient dynamical system that minimizes a Liapunov or energy function of state. 
We can generalize their analysis to the case of multiple inputs (M > 1) and non- 
symmetric weights using the averaged system That is, following along similar 
lines to [|Dong and Hopfield, 1992[|, we introduce the energy function 



1 



E(U,W) = --(U,W-U)- (T,U) + (l,S-i(U)) 
where U = S{V), \\W\\ 2 = (W, W) = £^ Wg, 

M N 

(u,w-u) = J2Y, u} a) WijUj a \ (X, U) 



Mfj, 



M N 



W\ 



a=l i=l 



a=l i=l 



and 



M N fU ( a ) 



(16) 



(17) 



(18) 



In contrast to [Dong and Hopfield, 1992 1, we do not require a priori that the 
weight matrix is symmetric. However, it can be shown that the system always 

converges to a symmetric connectivity pattern. More precisely, A = j (V, W) G 



jJVxM 



iNxN 



W = W | is an attractor of the system E'. A proof can be 
found in appendix 18. II It can then be shown that on A (symmetric weights), E is 
a Liapunov function of the dynamical system £', that is, 



dE 
~~dt ~ 



dE 

and — = 

dt 



y = {v,w) 



T 
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The boundedness of E and the Krasovskii-LaSalle invariance principle then im- 
plies that the system converges to an equilibrium QKhalil and Grizzle, 1996] . We 
thus have 

Theorem 4.1. The initial value problem for the system £' with 3^(0) G H, con- 
verges to an equilibrium state. 

Proof. See appendix W2\ □ 
It follows that neither oscillatory nor chaotic attractor dynamics can occur. 

4.2 Linear stability 

Although we have shown that there are stable fixed points, not all of the fixed 
points are stable. However, we can apply a linear stability analysis on the system 
E' to derive a simple sufficient condition for a fixed point to be stable. The method 
we use in the proof could be extended to more complex rules. The proof reveals 
the significant role played by the Kronecker product in Hebbian learning. 



Theorem 4.2. The equilibria of system £' satisfy: 

(19) 



V* = j^jSiV*) ■ S(V*) T ■ S{V*)+X 

w * = ia?<*(v*) • s(v*) T 

and a sufficient condition for stability is 

35^11^*11 < 1 (20) 
provided 1 > e/x which is most probably the case since e « 1. 
Proof. See appendix IQ1 □ 



This condition is strikingly similar to that derived in [ Faugeras et al., 2008] . In 



fact, condition (|20|) is stronger than the contracting condition (fTD) . It says the 
network may converge to a weakly connected situation. It justifies the averag- 
ing method by saying that we remain in the domain of validity of the averaging 
method. It also says that the dynamics of V is likely (because the condition is 
only sufficient) to be contracting and therefore subject to no bifurcations: a fully 
recurrent learning neural network is likely to have a "simple" dynamics. 
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5 Geometrical structure of equilibrium points 



5.1 Learning the correlation matrix of the inputs 

It follows from equation (fT9l ) that the equilibrium weight matrix W* is given by 
the correlation matrix of the firing rates. Moreover, in the case of sufficiently 
large inputs, the matrix of equilibrium membrane potentials satisfies V* ~ X. 
More precisely, if |S(4 a) ) | < | if } | for all a = 1, . . . , M and i = 1, . . . , N, then 
we can generate an iterative solution for V* of the form 

V* =1+-S(1) ■ S(X) T -S(l) +h.o.t. 

On the other hand, if the inputs are comparable in size to the synaptic weights, 
then there is no explicit solution for V*. Roughly speaking, we observe that the 
connection term has the role of "smoothing" the solution. Therefore, if a Gaussian 
is presented to the network (as the only input), the membrane potential is likely 
to be another Gaussian with a larger variance. If no input is presented to the 
network (/ = 0), then S(0) ^ implies that the activity is non-zero, that is, there 
is spontaneous activity. Combining these observations, we see that the network 
roughly extracts and stores the correlation matrix of the strongest inputs within 
the weights of the network. 



5.2 From a symmetric connectivity matrix to a convolutional 
network 

So far neurons have been identified by a label i E {L.iV}; there is no notion 
of geometry or space in the preceding results. However, as we show below, the 
inputs may contain a spatial structure that can be encoded by the connectivity. In 
this section, we propose a mechanism to unveil the hidden geometrical structure 
within the connectivity. More specifically, we want to find an integer k E N and N 
points in IR fc , denoted by Xi, i E {1, . . . , N}, so that the connectivity can roughly 
be written as W*j ~ exp(— \\xi — Xj\\ 2 ). In other words, we interpret the final 
connectivity as a matrix describing the distance between the neurons living in a 
k-dimensional space. However, W* is not always a distance matrix, therefore, it 
is natural to project it on the set of distance matrices. Finding the best fit of W* to 
a distance matrix is usually called multidimensional scaling. This set of methods 
is reviewed in ptorg and Groenen, 2005) . 
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First, define W G R NxN as W* whose diagonal terms are set to W^ ax the 
largest component of W*\ W*j = NijWij with iV^ = 1 if i ^ j and Nu = 
Wu/W^ax- Second, define a bijective kernel function on i G 1 + such that 
K{x) = W^^e - x l a ' 1 . Given that W is non-negative, we define the matrix D = 
K^iW) corresponding to the application of the inverse of K to each component 
of W. As said before, we want to find k G N and Xi G M. k for i G {1, . . . , N} 
so that D is a distance matrix, £)y = \\xi — Xj\\ 2 . In general, this is not possible. 
However, we can compute the projection of the symmetric matrix W onto the set 
of distance matrices by applying multidimensional scaling methods as described 
in [ Borg and Groenen, 2005] . We use the stress majorization or SMACOF algo- 



rithm for the stress 1 cost function throughout the article. In other words, we can 
find the distance matrix D± such that ||D n || = \\D — D±\\ is minimal. Therefore, 
Wij = K(D ll ) ij K(D ± ) ij . Define M such that M(x h Xj) = K(D n )ij Ny and G, 
a Gaussian with a standard deviation equal to a. In spatial coordinates 



W*{xi, Xj ) = M(x hXj ) G a {\\ Xi - Xj \\) (21) 

Multidimensional scaling methods consist in minimizing the contribution of M in 
the preceding equation. Hence, we refer to M as the non-convolutional connec- 
tivity. 

A position Xi G M fc is associated to each neuron % G {1, . . . , N} such that 

N N 

(w ■ s(v)). = j2 w ?A v i) = E M (^ x i) G °(W x * - x i\\) s ^ x ^ w 

3=1 j=l 

In particular, we can assume that k is large enough for ||-DJ to be very small. 
Moreover, if the neurons are equally excited on average (i.e. the diagonal of W* 
is already equal to W^.Id), then it is reasonable to consider that M(x i7 Xj) = 1 
leading to the following convolutional product 

W • S(V) = G a (\\.\\) * S(V) 

Therefore, in the space defined by the Xi G M. k the connectivity is close to being 
convolutional. 



5.3 Unveiling the geometrical structure of the inputs 

We hypothesize that the space defined by the x,i reflects the underlying geometrical 
structure of the inputs. We have not found a way to prove this so we only provide 
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numerical examples that illustrate this claim. In the following examples, we relate 
the geometry of the manifold suggested by the x,i to the network inputs. Thus we 
feed the network with inputs having a defined geometrical structure and then show 
how this structure can be extracted from the connectivity by the method above. 
However, it is by extracting the structure from unknown inputs that these networks 
might reveal themselves useful. Therefore, the following is only a (numerical) 
proof of concept. 

We assume the inputs to be uniformly distributed over a manifold with fixed 
geometry. This strong assumption amounts to considering that the feedforward 
connectivity (which we do not consider here) has already properly filtered the 
information coming from the sensory organs. More precisely, define the set of 
inputs by the matrix X e M. NxM sucn that I± = f(\\yi — z a \\o) where the z a are 
uniformly distributed points over Vt, the are the positions on Vt that "label" the 
ith neuron, and / is a decreasing function on R + . The norm is the natural 

norm defined over the manifold Vt. For simplicity, assume f(x) = f a (x) = Ae ~^ 
so that the inputs are localized bell-shaped bumps on the shape f2. 

5.3.1 Planar retinotopy 




Figure 3: Plot of planar retinotopic inputs on Q = [0, 1] x [0, 1] (left) and final con- 
nectivity matrix of the system S' (right). The parameters used for this simulation 

are s(x) = e _ 1 4( ,_ 1) , I = 1, // = 10, e = 0.001, N = M = 100, a = 4. 
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We consider a set of Gaussian inputs uniformly distributed over a two-dimensional 
plane, e.g. f2 = [0, 1] x [0, 1]. For simplicity, we take N = M = K 2 and set 
z a = Vi for i = a, a E {1, . . . , M}. (The numerical results show an identical 
structure for the final connectivity when the yj correspond to random points, but 
the analysis is harder). In the simpler case of one-dimensional Gaussians with 
N = M = K, the input matrix takes the form X = Tf a , where Tf is a symmetric 
Toeplitz matrix: 
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In the two-dimensional case, we set y = (u,v) G fl and introduce the labeling 
Vk+(i-i)K = (uk,vi) for k,l = 1,...K. It follows that If' ~ exp(-(u fc - 
u fc /) 2 ) exp(-(vj - fr) 2 ) for i = A; + (Z - l)i^ and a = fc' + (/' - Hence, 
we can write X = X) ct g) T^, where <8> is the Kronecker product; the Kronecker 
product is responsible for the K x K sub-structure we can observe in figure |3]with 
K = 10. Note that if we were interested in a n-dimensional retinotopy, then the 
input matrix could be written as a Kronecker product between n Toeplitz matrices. 
As previously mentioned, the final connectivity matrix roughly corresponds to the 
correlation matrix of the input matrix. It turns out that the correlation matrix of X 
is also a Kronecker product of two Toeplitz matrix generated by a single Gaussian 
(with a different standard deviation). Thus, the connectivity matrix has the same 
basic form as the input matrix when z a = yi for i = a. The inputs and stable 
equilibrium points of the simulated system are shown in figure [3l The positions 
Xi of the neurons after multidimensional scaling are shown in figure |4] 



5.3.2 Toroidal retinotopy 

We now assume that the inputs are uniformly distributed over a two-dimensional 
torus, i.e. f2 = T 2 . That is, the input labels z a are randomly distributed on the 
torus. The neuron labels yi are regularly and uniformly distributed on the torus. 
The inputs and final stable weight matrix of the simulated system are shown in 
figure |5l The positions Xi of the neurons after multidimensional scaling for k = 3 
are shown in figure [6] and appear to form a cloud of points distributed on a torus. 
In order to confirm this, we have used a numerical method from computational co- 
homology flZomorodian and Carlsson, 2005] to construct the so-called persistent 
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Figure 4: Positions Xi of the neurons after having applied classical multidimen- 
sional scaling to the final connectivity matrix shown in figure [3] for k = 2 (left) 
and k = 3 (right). The regular spacing of the neurons for k = 2 shows that 
the planar structure of the inputs has been recovered, although the comer of the 
square appear less regular due to boundary effects. In the case k = 3, there is an 
embedding of the plane into three dimensions; the saddle-like shape accounts for 
the corner irregularity observed when k = 2. 




Final connectivity W^* j 



0.0| I 




Figure 5: Plot of retinotopic inputs on VL = T 2 (left) and the final connectivity 
matrix (right) for the system The parameters used for this simulation are 

s (x) = Y^k^T) . / = 1, /i = 10, e = 0.001, N = 1000, M = 10, 000, a = 2. 
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Figure 6: Left: Positions xi of the neurons for k — 3 after having applied multi- 
dimensional scaling methods presented in part l5.2l to the final connectivity matrix 
shown in figured Right: Persistent cohomology barcodes of the cloud of points Xi 



computed using the Jplex software package of [Sexton and Vej demo- Johansson, 



(See section [8~4l for a short introduction to persistent cohomology). The triplet of 
betti numbers (1,2,1) appear stable confirming that the points lie on a 2 dimen- 
sional torus. 
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cohomology barcodes of the neurons' positions. These determine certain topolog- 
ical invariants of the underlying space. (See section [8T41 for a short introduction 
to persistent cohomology and barcodes). The results are also shown in figure [6] 
and establish that the network has learnt the underlying toroidal geometry of the 
inputs. 



5.4 Links with neuroanatomy 

The brain is subject to energy constraints which are completely neglected in the 
above formulation. These constraints most likely have a significant impact on the 
positions of real neurons in the brain. Indeed, it seems reasonable to assume that 
the positions and connections of neurons reflect a trade-off between the energy 
costs of biological tissue and their need to process information effectively. For 
instance, it has been suggested that a principle of wire length minimization may 
occur in the brain QSwindale, 1996| |Chklovskii et al., 2002| . In our neural mass 



framework, one may consider that the stronger two neural mas ses are connected, 
the larger the number of real axons linking the neurons together. Therefore, mini- 
mizing axonal length can be read as: the stronger the connection the closer, which 
is consistent with the convolutional part of the weight matrix. However, the un- 
derlying geometry of natural inputs is likely to be very high-dimensional, whereas 
the brain lies in a three-dimensional world. In fact, the cortex is so flat that it is 
effectively two-dimensional. Hence, the positions of real neurons are different 
from the positions %i E IR fc in a high dimensional vector space; since the cor- 
tex is roughly two-dimensional, the positions could only be realized physically if 
k = 2. Therefore, the three-dimensional toric geometry or any higher dimensional 
structure could not be perfectly implemented in the cortex without the help of non- 
convolutional long-range connections. Indeed, we suggest that the cortical con- 
nectivity is made of two parts: i) a local convolutional connectivity corresponding 
to the convolutional term G a in (12TI) . which is consistent with the requirements of 
energy efficiency, and ii) a non-convolutional connectivity corresponding to the 
factor M in equation (12TI) . which is required in order to represent various stimulus 
features. If the cortex were higher-dimensional (k 2) then M = 1. 

We illustrate the above claim by considering two examples based on the func- 
tional anatomy of the primary visual cortex: the emergence of ocular dominance 
columns and orientation columns, respectively. We proceed by returning to the 
case of planar retinotopy (section 5.3.1) but now with additional input structure. 
In the first case, the inputs are taken to be binocular and isotropic, whereas in 
the second case they are taken to be monocular and anisotropic. The details are 
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presented below. Given a set of prescribed inputs, the network evolves accord- 
ing to equation (fT51 ) and the lateral connections converge to a stable equilibrium. 
The resulting weight matrix is then projected onto the set of distance matrices for 
k = 2 (as described in section 15.21 ) using the stress majorization or SMACOF 
algorithm for the stress 1 cost function as described in [ jBorg and Groenen, 2005[ . 
We thus assign a position x« 6 M 2 to the ith neuron, % — 1, . . . , N. (Note that the 
position Xi extracted from the weights using multidimensional scaling is distinct 
from the "physical" position of the neuron in the retinocortical plane; the latter 
determines the center of its receptive field). The convolutional connectivity (Go- 
in equation [2D is therefore completely defined: on the planar map of points xi, 
neurons are isotropically connected to their neighbors; the closer the neurons are 
the stronger is their convolutional connection. Moreover, since the stimulus fea- 
ture preferences (orientation, ocular dominance) of each neuron i, i = 1, . . . , N, 
are prescribed, we can superimpose these feature preferences on to the planar 
map of points Xi. In both examples, we find that neurons with the same ocular 
or orientation selectivity tend to cluster together (see figures [7] and [8]): interpolat- 
ing these clusters then generates corresponding feature columns. It is important 
to emphasize that the retinocortical positions y; L do not have any columnar struc- 
ture, that is, they do not form clusters with similar feature preferences. Thus, in 
contrast to standard developmental models of vertical connections, the columnar 
structure emerges from the recurrent weights following Hebbian learning and an 
application of multidimensional scaling. It follows that neurons coding for the 
same feature tend to be strongly connected; indeed, the multidimensional scaling 
algorithm has the property that it positions strongly connected neurons close to- 
gether . Equation (12TI) also suggests that the connectivity has a non-convolutional 
part, M, which is a consequence of the low-dimensionality (k = 2). In order to 
illustrate the structure of the non-convolutional connectivity, we select a neuron i 
in the plane and draw a link from it at position Xj to the neurons at position Xj for 
which M(xi, Xj) is maximal. We find that M tends to be patchy, i.e. it connects 
neurons having the same feature preferences. In the case of orientation, M also 
tends to be co-aligned, i.e. connecting neurons with similar orientation preference 
along a vector in the plane of the same orientation. 



5.4.1 Ocular dominance columns and patchy connectivity 

In order to construct binocular inputs, we partition the N neurons into two sets 
i E {1, . . . , N/2} and i e {N/2 + 1, . . . , N} that code for the left and right eyes, 
respectively. The ith neuron is then given a retinocortical position j/j G [0,1] x 
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[0, 1], with the uniformly distributed across the plane. We do not assume a 
priori that there exist any ocular dominance columns, that is, neurons with similar 
retinocortical positions yi do not form clusters of cells coding for the same eye. 
We then take the ath input to the network to be of the form 

4 a) = (l + 7(a))e"^, i = l JV/2 

4 a) = (1 - 7 (a))e- Mz ^ £l! , i — N/2 + 1,...,N, 

where s E M 2 represents some form of binocular disparity, z a and 7(a) are 
randomly generated from [0, l] 2 and [—1,1], respectively, see QBressloff, 2005] , 
Thus, if 7(a) > (7(a) < 0) then the corresponding input is predominantly from 
the left (right) eye. In our simulations we take a — a' — 0.1 and s = 0.005. The 
results of our simulations are shown in figure [7] In particular, we plot the points Xi 
obtained by performing multidimensional scaling on the final connectivity matrix 
for k = 2, and superimposing upon this the ocular dominance map obtained by 
interpolating between clusters of neurons with the same eye preference. We also 
illustrate the non-convolutional connectivity by linking one selected neuron to the 
five neurons labeled j it is most strongly connected to (with M(xi, Xj) > 1), with 
i the label of the central neuron. This clearly shows that long-range connections 
tend to link cells with the same ocular dominance. 

5.4.2 Orientation columns and colinear connectivity 

In order to construct oriented inputs, we partition the iV neurons into four groups 
Eg corresponding to different orientation preferences 9 = {0, |, |, ^}. Thus, 
if neuron i E then its orientation preference is 9i = 9. For each group, the 
neurons are randomly assigned a retinocortical position y { E [0, 1] x [0,1]. Again, 
we do not assume a priori that there exist any orientation columns, that is, neurons 
with similar retinocortical positions i/i do not form clusters of cells coding for the 
same orientation preference. Each cortical input l[ a ^ is generated by convolving a 
thalamic input consisting of an oriented Gaussian with a Gabor-like receptive field 
flMiikkulainen et al., 2005] . Let TZq denote a 2-dimensional rigid body rotation in 
the plane with 9 E [0, 27r). Then 

4° } = / G^-yM^-Za)^, (24) 

where 

G t (0 = G o (Tl 0i O (25) 
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Figure 7: Plot of the positions Xj of neurons for k = 2 in red (right eye) and green 
(left eye). We have used an interpolation method to highlight the areas dominated 
by the right eye in black. These ocular dominance columns have fractal borders 
which are less regular than those observed in optical imaging experiments. The 
convolutional connectivity (G a in equation (|2TI) ) is implicitly described by the po- 
sition of the neurons: the closer the neurons, the stronger their connections. The 
strongest components of the non-convolutional connectivity (M in equation (|2TI) ) 
from a central red neuron are also shown by drawing links from this neuron to 
the target neurons. The color of the link refers to the color of the target neuron. 
Therefore, we see that it is mainly connected to neurons of its same ocular dom- 
inance resulting in a patchy distribution. The parameters used for this simulation 
are s(x) = , / = 1, /i = 10, e = 0.01, N = 800 M = 3200. 
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and Gq(0 is the Gabor-like function 



G (0 = A + e 



T .A- 1 .C 



r A-i.(£+e ) 



with e = (0, 1) and 



A 




& small 







) 



The amplitudes A + , are chosen so that J G (^)d^ = 0. Similarly, the thalamic 
input I a (0 = iCR-o'aO w ^ m AO me anisotropic Gaussian 



The input parameters 9' a and z a are randomly generated from [0, 7r) and [0, l] 2 
respectively. In our simulations we take <Ji arge = 0.133..., cr' large = 0.266... and 
o'smaii = v'smaii = 0.0333.... The results of our simulations are shown in figure |8] 
In particular, we plot the points %i obtained by performing multidimensional scal- 
ing on the final connectivity matrix for = 2, and superimposing upon this the 
orientation preference map obtained by interpolating between clusters of neurons 
with the same orientation preference. To avoid border problems we have zoomed 
on the center on the map. We also illustrate the non-convolutional connectiv- 
ity by linking one selected neuron to all other neurons for which M is maximal. 
The patchy, anisotropic nature of the long-range connections is clearly seen. The 
anisotropic nature of the connections is further quantified in the histogram of fig- 
ure El 



6 Discussion 

In this paper, we have shown how a neural network can learn the underlying geom- 
etry of a set of inputs. We have considered a fully recurrent neural network whose 
dynamics is described by a simple non-linear rate equation, together with unsu- 
pervised Hebbian learning with decay that occurs on a much slower time scale. 
Although several inputs are periodically presented to the network, so that the re- 
sulting dynamical system is non-autonomous, we have shown that such a system 
has a fairly simple dynamics: the network connectivity matrix always converges 
to an equilibrium point. We have then demonstrated how this connectivity matrix 
can be expressed as a distance matrix in ~R k for sufficiently large k, which can be 
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Figure 8: Plot of the positions x t of neurons for k = 2 obtained by multidi- 
mensional scaling of the weight matrix. Neurons are clustered in orientation 
columns represented by the colored areas, which are computed by interpolation. 
The strongest components of the non-convolutional connectivity (M in equation 
(1271) ) from a particular neuron in a yellow area are illustrated by drawing black 
links from this neuron to the target neurons. Since the yellow color corresponds 
to an orientation of ^f, the non-convolutional connectivity shows the existence 
of a co-linear connectivity as exposed in [Bosking et al., 1997 1. The parameters 
used for this simulation are s(x) = 1+e _4 (3; _ 1) , I — 1, ji — 10, e = 0.01, N = 900 
M = 9000. 
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Figure 9: Histogram of the 5 largest components of the non-convolutional connec- 
tivity for 80 neurons randomly chosen among those shown in Fig. [8] The abcissa 
corresponds to the difference in radian between the direction preference of the 
neuron and the direction of the links between the neuron and the target neurons. 
This histogram is weighted by the strength of the non-convolutional connectiv- 
ity. It shows a preference for co-aligned neurons but also a slight preference for 
perpendicularly-aligned neurons (e.g. neurons of the same orientation but parallel 
to each other). 
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related to the underlying geometrical structure of the inputs. If the connectivity 
matrix is embedded in a lower two-dimensional space (k = 2), then the emerg- 
ing patterns are similar to experimentally observed cortical feature maps. That 
is, neurons with the same feature preferences tend to cluster together forming 
cortical columns within the embedding space. Moreover, the recurrent weights 
decompose into a local isotropic convolutional part, which is consistent with the 
requirements of energy efficiency, and a longer-range non-convolutional part that 
is patchy. This suggest a new interpretation of the cortical maps: they correspond 
to two-dimensional embeddings of the underlying geometry of the inputs. 

One of the limitations of applying simple Hebbian learning to recurrent corti- 
cal connections is that it only takes into account excitatory connections, whereas 
20% of cortical neurons are inhibitory. Indeed, in most developmental models of 
feedforward connections, it is assumed that the local and convolutional connec- 
tions in cortex have a Mexican hat shape with negative (inhibitory) lobes for neu- 
rons that are sufficiently far from each other. From a computational perspective, it 
is possible to obtain such a weight distribution by replacing Hebbian learning with 
some form of covariance learning ([ Sejnowski and Tesauro, 1989[ ). However, it 
is difficult to prove convergence to a fixed point in the case of the covariance learn- 
ing rule, and multidimensional scaling method cannot be applied directly unless 
the Mexican hat function is truncated so that it is invertible. Another limitation 
of rate-based Hebbian learning is that it does not take into account causality, in 
contrast to more biologically detailed mechanisms such as spike timing dependent 
plasticity. 

The approach taken here is very different from standard treatments of cortical 
development flMiller et al., 1989l|Swindale, 1996] , in which the recurrent connec- 
tions are assumed to be fixed and of convolutional Mexican hat form whilst the 
feedforward vertical connections undergo some form of correlation-based Heb- 
bian learning. In the latter case, cortical feature maps form in the physical space 
of retinocortoical coordinates yi, rather than in the more abstract planar space 
of points Xi obtained by applying multidimensional scaling to recurrent weights 
undergoing Hebbian learning in the presence of fixed vertical connections. A par- 
ticular feature of cortical maps formed by modifiable feedforward connections is 
that the mean size of a column is determined by a Turing-like pattern forming in- 
stability, and depends on the length scales of the Mexican hat weight function and 
the two-point input correlations QMiller et al., 1989] |Swindale, 1996] . No such 
Turing mechanism exists in our approach so that the resulting cortical maps tend 
to be more fractal-like (many length scales) compared to real cortical maps. Nev- 
ertheless, we have established that the geometrical structure of cortical feature 
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maps can also be encoded by modifiable recurrent connections. This should have 
interesting consequences for models that consider the joint development of feed- 
forward and recurrent cortical connections. One possibility is that the embedding 
space of points Xi arising from multidimensional scaling of the weights becomes 
identified with the physical space of retinocortical positions j/j. The emergence of 
local convolutional structures together with sparser long-range connections would 
then be consistent with energy efficiency constraints in physical space. 

Our paper also draws a direct link between the recurrent connectivity of the 
network and the positions of neurons in some vector space such as M 2 . In other 
words, learning corresponds to moving neurons or nodes so that their final position 
will match the inputs' geometrical structure. Similarly, the Kohonen algorithm 
flKohonen, 1990 ] describes a way to move nodes according to the inputs presented 
to the network. It also converges toward the underlying geometry of the set of 
inputs. Although not formally equivalent, it seems that both of these approaches 
have the same qualitative behaviour. However, our method is more general in the 
sense that no neighborhood structure is assumed a priori; such a structure emerges 
via the embedding into M, k . 

Finally, note that we have used a discrete formalism based on a finite num- 
ber of neurons. However, the resulting convolutional structure obtained by ex- 
pressing the weight matrix as a distance matrix in M fc , equation (12TI) . allows us 
to take an appropriate continuum limit. This then generates a continuous neural 
field model in the form of an integro-differential equation whose integral kernel is 
given by the underlying weight distribution. Neural fields have been used increas- 
ingly to study large-scale cortical dynamics (see QCoombes, 2005] for a review). 
Our geometrical learning theory provides a developmental mechanism for the for- 
mation of these neural fields. One of the useful features of neural fields from a 
mathematical perspective is that many of the methods of partial differential equa- 
tions can be carried over. Indeed, for a general class of connectivity functions 
defined over continuous neural fields, a reaction-diffusion equation can be de- 
rived whose solution approximates the firing rate of the associated neural field 
JDegond and Mas-Gallic , 1989||Cottet, 1 995 1 [Edwards, 19961 . It appears that the 
necessary connectivity functions are precisely those that can be written in the 
form (I2TI) . This suggests that a network that has been trained on a set inputs with 
an appropriate geometrical structure behaves as a diffusion equation in a high- 
dimensional space together with a reaction term corresponding to the inputs. 
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8 Appendix 

8.1 Proof of the convergence to the symmetric attractor A 

We need to prove the 2 points: (i) A is an invariant set, and (ii) for all y(0) G 
r nxm x r nxn^ yy.j converges to A as +oo. Since R NxN is the direct sum 
of the set of symmetric connectivities and the set of anti-symmetric connectivies, 
we write W(t) = W s (t) + ^(i), Wt G M+, where W s is symetric and W A is 
anti-symetric. 

(i) In ( fT51) , the right hand side of the equation for W is symmetric. Therefore, 
if 3ti G R+ such that W A (ti) = 0, then W remains in A for t > t 1 . 

(ii) Projecting the expression for W in equation (TT5T) on to the anti- symmetric 
component leads to 

dW A 

—A = - ef iW A (t) (26) 
dt 

whose solutionis W A {t) = W A (0) exp(-e)t/t), Vt G R+. Therefore, lim W A (t) = 

t— >+oo 

0. The system converges exponentially to A. 



8.2 Proof of theorem 4.1 



Consider the following Lyapunov function (see equation ([TBI) ) 

E(U,W) = ~(U,W-U) - (I,U) + (l,S^(K)} + ^\\W\\ 2 , (27) 

where \x = fiM, such that if W — Ws + W A , where Ws is symmetric and W A is 
anti- symmetric. 
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Therefore, writing the system S', equation (fT51) , as 

dy _ fw s ■ s(v) + 1-S- 1 (s(v))\ fw A .s(v)\ 

dt 7 V S(V) ■ S(V) T - pw J +1 \ J' 
where y = (V, W) T , we see that 

^ = -j(vE(a(V,W)^j +T(t) (29) 

where 7 (V, 1^) T = (V, e^/M) T , a(V, W) = (£(V), W) and T : M+ -> U such 
that ||r|| — > exponentially (because the system converges to A). It follows 

t— >+oo 

that the time derivative of E = E o a along trajectories is given by: 

dE I ~ dy\ I ~ dV\ /„ ~ dW\ 



Substituting equation ( 1291 ) then yields 

dE 



x WE, 1 (VEoa)) + {\/E,T(t)) (31) 



r(t) 

S'(V)VuE o a, V w £ o - jj(^wE o a, V W E o + f(t). 

We have used the chain-rule of differentiation, whereby 

V V (E) = V V (E oa) = S'(V)V U E o a, 

and S'(V) VuE (without dots) denotes the Hadamard (term by term) product, that 
is, 

[S\V)V u E\ ia = J(V{ a) )—- 

oU, 



(a) 



Note that |T| — > exponentially because V E is bounded, and S'(V) > 

t— ¥ + 00 

because the trajectories are bounded. Thus, there exists t\ E M + such that V£ > ti, 
3k G R* + such that 

^ < -fcllV^oall 2 < 0. (32) 
at 

As in [ |Cohen and Grossberg, 1983) and [ pong and Hopfield, 1992) , we apply the 
Krasovskii-LaSalle invariance principle QKhalil and Grizzle, 1996] . We check that: 
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• E is lower bounded. Indeed, V and W are bounded. Given that X and S are 
also bounded it is clear that E is bounded. 

dE 

• — is negative semidefinite on the trajectories as shown in equation (|32l ). 

at 

Then the invariance principle tells us that the solutions of the system £' approach 
the set M = G H : ^CV) = °}- Equation © implies that M = |y G 

"H : VE o cr = 1 . Since ^ = —7 ^ V-E o a j and 7 7^ everywhere, M consists 
of the equilibrium points of the system. This completes the proof. 



8.3 Proof of theorem 4.2 



Denote the right-hand side of system E', equation (IT5T ) by 

r -v + w ■ s(v) +1 

F ^ W) = \^(S(V).S(Vf-»MW) 

The fixed points satisfy the condition F(V, W) = which immediately leads to 
equations (fT9l ). Let us now check the linear stability of this system. The differen- 
tial of F at V*, W* is 

/ —Z + w* ■ (s'(y*)z) + j ■ siy*) 

dF (y * >w * ) (Z,J)= yj_^ s ,(y,} Z j . S (V*) T + S(V*) ■ (S'(V*)Z) T - flMJ 

where S'(V*)Z denotes a Hadamard product, that is, [S'(V*)Z} ia = s'(V* {a) )zl a) . 
Assume that there exist A G C*, (Z,J) G H such that dF(y*^ w ^ f jj — 

A ( T J . Taking the second component of this equation and computing the dot 



product with S(V*) leads to 

(A + en) J - S =J7 (i s ' z ) -S T -S + S- (S'Zf ■ S) 

where S = ^(V*), S' = S'(V*). Substituting this expression in the first equation 
leads to 

M(X + en)(X + l)Z = (-+e)S-S T -(S'Z) + e(S'Z)-S T -S + eS-(S'Z) T -S 

n 

(33) 
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Observe that setting e = in the previous equation leads to an eigenvalue 
equation for the membrane potential only: 

(X + 1)Z = ^-S ■ S T ■ (S'Z). 

Since W * = -jj{S ■ S T ) , this equation implies that A + 1 is an eigenvalue of 
the operator X W*.(S'X). The magnitudes of the eigenvalues are always 
smaller than the norm of the operator. Therefore, we can say that if 1 > \\W* \\S' m 
then all the possible eigenvalues A must have a negative real part. This sufficient 



condition for stability is the same as in [Faugeras et al., 2008 1. It says that fixed 



points sufficiently close to the origin are always stable. 

Let us now consider the case e / 0. Recall that Z is a matrix. We now 
"flatten" Z by storing its rows in a vector called Z row . We use the following result 
in [Brewer, 1978] : the matrix notation of operator X H- A-X-B is A®B T , where 



® is the Kronecker product. In this formalism the previous equation becomes 

M(\+efi)(\+l)Z row = ^+e)S-S T ®I d +eI d ®S T -S+eS0S T y(S'Z) mw 

(34) 

where we assume that the Kronecker product has the priority over the dot product. 
We focus on the linear operator O defined by the right hand side and bound its 
norm. Note that we use the following norm || W|L = sup x ^ which is equal 
to the largest magnitude of the eigenvalues of W . 

OIL < SL ( 1-1 \\S ■ S T ® I d \\oc + e\\S ■ S T ® /J oo + e\\I d ® S T ■ S1L 



+ e|LS , ®S T |L ). (35) 



Define, v m to be the magnitude of the largest eigenvalue of W* = -jj(S ■ S T ). 
First, note that S ■ S T and S T ■ S have the same eigenvalues (fxM)^ but different 
eigenvectors denoted by itj for S ■ S T and v t for S T ■ S. In the basis set spanned by 
the Ui g) Vj, we find that S ■ S T ® I d and I d ® S T ■ S are diagonal with (/jM)^ as 
eigenvalues. Therefore, ||5'-5' T ®/ rf || 0O = {\xM)v m and \\I d ^)S T -S\\ OQ = (fiM)is m . 
Moreover, observe that 

(S T ®S) T -(S T ^S)-(u i (^v j ) = (S-S T - Ui )®(S T -S-Vj) = (/iM) 2 v i v j u i ®v j (36) 
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Therefore, (S T <g> S) T ■ (S T ® 5) = (/iM) 2 diagtV^). In other words, S T ® 5 
is the composition of an orthogonal operator (i.e. an isometry) and a diagonal 
matrix. Immediately, it follows that \\S T (g> 511 < (fxM)u m . 
Compute the norm of equation (134T ) 



| (A + efx){\ + 1)| < S' m (\X\ + 3e^)z/ m . (37) 

Define f e : C ->• R such that / e (A) = |(A + e//)||(A + 1)| - (|A| + 3efx)S' m v m . 
We want to find a condition such that / e (C+) > 0, where C + is the right half 
complex plane. This condition on e, fx, v m , and S' m will be a sufficient condition 
for linear stability. Indeed, under this condition we can show that only eigenvalues 
with a negative real part can meet the necessary condition d37l) . Complex number 
of the right half plane cannot be eigenvalues and thus the system is stable. The 
case e = tells us that /o(C+) > if 1 > S' m u m , compute 

If 1 > efx, which is most probably true given that e << 1, then fT^^i — ^- 
Assuming A £ C + leads to: 

— -(A) > fx(fxe - 3S' m u m ) > fx(l - 3S' m v m ) 

0€ 

Therefore, the condition 3S' m v m < 1, which implies S' m v m < 1, and leads to 

fe(C + ) > 0. 

8.4 A very short introduction to computational cohomology 

In algebraic topology, topological spaces (which are continuous objects) can be 
classified by roughly counting their number of holes. This coordinate-invariant 
description of a topological space is called its homology (or cohomology, the 
difference between them is beyond the scope of this paper). In fact the homology 
can be summarized by giving the betti numbers of the topological state. The 
sequence of betti number is made of positive integers. The first three betti numbers 
have the following definition: the first is the number of connected components, 
the second is the number of two-dimensional or "circular" holes and the third is 



the number of 3-dimensional holes or "voids". See QHatcher, 2002] for a more 
rigorous approach. 
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However, in the example of toroidal retinotopy (see section 5.3.2 and figure |6]). 
we are dealing with a discrete cloud of points. Therefore, one needs to extend the 
definition of the betti numbers to discrete objects in order to find the underly- 
ing topology of the space within which the points are distributed. This is called 
computational or persistent cohomology. One reconstructs the topological space 
by considering balls of a given radius centered on each point in the cloud. For 
each radius (the abscissa of the right picture of figure |6), one can compute the 
betti numbers of the resulting topological space. A barcode graph, e.g. the right 
picture of figure |6] is constructed by drawing a horizontal bar for each connected 
component, 2-dimensional hole or 3-dimenional void etc. This is done for a range 
of radii. Finding the persistent cohomology of a cloud of points consists in ob- 
serving the set of betti numbers that are stable through a significantly large range 
of radii. One then assumes that the points most likely lie on a topological space 
of a given homology if the corresponding betti numbers are stable enough. See 
[Zomorodian and Carlsson, 2005) for details. 

In section |53^2l we used the Jplex software package of [ |Sexton and Vejdemo- Johansson, 
to compute the barcodes of the points corresponding to learning from inputs uni- 
formly distributed over a 2-dimensional torus. We used 200 landmarks spread 
according to the maxminlandmark method to build simplices in Jplex, which re- 
turned the maximum radius (beyond which all the betti numbers except the first 
vanish) we used in the simulation. In figure |6l we see that for a wide range of radii 
the triplet (1, 2, 1) is stable. This corresponds to a 2 dimensional torus. 
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